If we are successful, you should be able to hit the ground running on your own project with R
Install R from CRAN
Install RStudio from the RStudio website
Install the tidyverse, lubridate, and ggmap packages
install.packages(c("tidyverse", "lubridate", "ggmap")) #wrap the package names in c() so all three are installed
#you will see activity in the console as the packages are installed
Create a folder called “R workshop”
Download the 311 data from the Western Pennsylvania Regional Data Center (WPRDC)
Move that CSV into the “R workshop” folder
R is an interpreted programming language for statistics
RStudio is an integrated development environment (IDE) for R
1
## [1] 1
1 + 2
## [1] 3
10 / 2
## [1] 5
5 * 2
## [1] 10
"this is a string. strings in R are surrounded by quotation marks."
## [1] "this is a string. strings in R are surrounded by quotation marks."
Type matters
"1" + 1
## Error in "1" + 1: non-numeric argument to binary operator
str(1)
## num 1
str("1")
## chr "1"
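When a number has been stored as text, convert it before doing math. A quick sketch using only base R:

```r
# as.numeric() turns a string that looks like a number into a number
as.numeric("1") + 1                  # 2

# class() names the type of an object
class(1)                             # "numeric"
class("1")                           # "character"

# text that does not look like a number becomes NA (R also warns about this)
suppressWarnings(as.numeric("one"))  # NA
```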
The objects you create are shown in the Environment panel
x
## Error in eval(expr, envir, enclos): object 'x' not found
x <- 1
x
## [1] 1
You can overwrite (or update) an object
x <- 2
x
## [1] 2
x <- 1
y <- 5
x + y
## [1] 6
c() means “combine” (or “concatenate”). It combines values into a vector
a <- c(x, y)
a
## [1] 1 5
z <- sum(a)
z
## [1] 6
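Most R operations work on every element of a vector at once. A short sketch with the same vector a:

```r
a <- c(1, 5)

a * 2          # every element is doubled: 2 10
a + c(10, 20)  # element by element: 11 25
length(a)      # number of elements: 2
mean(a)        # 3
```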
my_df <- data.frame(a = 1:5,
b = 6:10,
c = c("a", "b", "c", "d", "e"))
my_df
## a b c
## 1 1 6 a
## 2 2 7 b
## 3 3 8 c
## 4 4 9 d
## 5 5 10 e
Select individual columns in a dataframe with the $ operator
my_df$a
## [1] 1 2 3 4 5
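$ is one of several ways to pull values out of a dataframe. This sketch re-creates my_df so it runs on its own (stringsAsFactors = FALSE keeps column c as text on older R versions):

```r
my_df <- data.frame(a = 1:5,
                    b = 6:10,
                    c = c("a", "b", "c", "d", "e"),
                    stringsAsFactors = FALSE)

my_df[["b"]]   # same as my_df$b: 6 7 8 9 10
my_df$b[2]     # the second value in column b: 7
my_df[2, "c"]  # row 2, column c: "b"
```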
At the top level, “<-” and “=” both assign a value to an object. By convention, many people use “<-” to create objects and “=” to set arguments inside function calls, such as naming the columns in data.frame()
x <- 1
a <- data.frame(a = 1:5,
b = 6:10)
a
## a b
## 1 1 6
## 2 2 7
## 3 3 8
## 4 4 9
## 5 5 10
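Inside a function call, “=” only names an argument; it does not create or change an object. A minimal sketch:

```r
x <- 1

# "x =" here is the name of median()'s argument, not an assignment
median(x = c(2, 4, 6))  # 4

x                       # still 1
```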
“x == y” means “is x equal to y?”
1 == 2
## [1] FALSE
“!” means “not”
!FALSE
## [1] TRUE
In arithmetic, TRUE counts as 1 and FALSE counts as 0
TRUE + FALSE
## [1] 1
TRUE + TRUE
## [1] 2
R is case-sensitive
"a" == "A"
## [1] FALSE
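Because TRUE counts as 1, sum() of a comparison counts how many values pass it, and mean() gives the share that pass. A sketch with made-up daily request counts:

```r
requests <- c(3, 12, 7, 0, 25)  # hypothetical daily request counts

requests > 5        # FALSE TRUE TRUE FALSE TRUE
sum(requests > 5)   # days with more than 5 requests: 3
mean(requests > 5)  # share of days with more than 5 requests: 0.6
```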
library(package_name)
You only install a package once, but you have to load it with library() each time you start a new R session
#start a line of code with a "#" to make that line a comment
#1 + 1
#code that is "commented out" will not be executed
Use the built-in documentation. Put a “?” before the name of a function to access the documentation in the Help panel
?mean
## starting httpd help server ... done
How to set up the working directory
getwd()
## [1] "C:/Users/conor/githubfolder/pittsburgh_311/code_for_pittsburgh_presentation"
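Session menu -> Set Working Directory -> Choose Directory... -> choose your “R workshop” folder
setwd("C:/path/to/R workshop") #or set it in code. replace this placeholder path with the location of your own folder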
R separates the data from the analysis. The data is stored in files (CSV, JSON, etc). The analysis is stored in scripts. This makes it easier to share analysis performed in R. No need to take screenshots of your workflow in Excel. You have a record of everything that was done to the data.
The tidyverse is a group of R packages that share a common grammar for wrangling, analyzing, modeling, and graphing data
read_csv() reads a CSV file into a dataframe. A bare file name is looked up in your working directory, but you can also give it a full path or a URL
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1.9000 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.8.0 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
#df <- read_csv("your_file_name_here.csv")
df <- read_csv("https://raw.githubusercontent.com/conorotompkins/pittsburgh_311/master/data/pittsburgh_311_2018_04_10.csv")
## Parsed with column specification:
## cols(
## `_id` = col_integer(),
## REQUEST_ID = col_integer(),
## CREATED_ON = col_datetime(format = ""),
## REQUEST_TYPE = col_character(),
## REQUEST_ORIGIN = col_character(),
## STATUS = col_integer(),
## DEPARTMENT = col_character(),
## NEIGHBORHOOD = col_character(),
## COUNCIL_DISTRICT = col_integer(),
## WARD = col_integer(),
## TRACT = col_double(),
## PUBLIC_WORKS_DIVISION = col_integer(),
## PLI_DIVISION = col_integer(),
## POLICE_ZONE = col_integer(),
## FIRE_ZONE = col_character(),
## X = col_double(),
## Y = col_double(),
## GEO_ACCURACY = col_character()
## )
colnames(df) <- tolower(colnames(df)) #make all the column names lowercase. this is a personal preference
#initial data munging to get the dates in shape
df %>%
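mutate(date = as_date(created_on), #created_on is already a date-time, so take its date part directly
time = hms(format(created_on, "%H:%M:%S")), #and its clock time as hours, minutes, and seconds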
month = month(date, label = TRUE),
year = year(date),
yday = yday(date)) -> df
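To see what the lubridate functions return, try them on a single timestamp. This sketch uses one value copied from the data preview; the month label assumes an English locale:

```r
library(lubridate)

created <- ymd_hms("2016-03-10 13:52:00")  # one timestamp, parsed as UTC

as_date(created)              # the date part: "2016-03-10"
format(created, "%H:%M:%S")   # the clock time: "13:52:00"
month(created, label = TRUE)  # Mar
year(created)                 # 2016
yday(created)                 # 70 (31 + 29 + 10, because 2016 is a leap year)
```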
Explore the data
df #simply type the name of the object to preview it
## # A tibble: 225,189 x 23
## `_id` request_id created_on request_type request_origin
## <int> <int> <dttm> <chr> <chr>
## 1 154245 54111 2016-03-10 13:52:00 Rodent control Call Center
## 2 154246 53833 2016-03-09 14:22:00 Rodent control Call Center
## 3 154247 52574 2016-03-03 07:13:00 Potholes Call Center
## 4 154248 54293 2016-03-11 10:12:00 Building Without ~ Control Panel
## 5 154249 53560 2016-03-08 14:57:00 Potholes Call Center
## 6 154250 49519 2016-02-22 09:10:00 Potholes Call Center
## 7 154251 49484 2016-02-22 08:03:00 Potholes Call Center
## 8 154252 53787 2016-03-09 12:21:00 Rodent control Call Center
## 9 154253 52887 2016-03-04 12:49:00 Potholes Call Center
## 10 154254 53599 2016-03-08 16:03:00 Rodent control Call Center
## # ... with 225,179 more rows, and 18 more variables: status <int>,
## # department <chr>, neighborhood <chr>, council_district <int>,
## # ward <int>, tract <dbl>, public_works_division <int>,
## # pli_division <int>, police_zone <int>, fire_zone <chr>, x <dbl>,
## # y <dbl>, geo_accuracy <chr>, date <date>, time <S4: Period>,
## # month <ord>, year <dbl>, yday <dbl>
glimpse(df) #get a summary of the dataframe
## Observations: 225,189
## Variables: 23
## $ `_id` <int> 154245, 154246, 154247, 154248, 154249, ...
## $ request_id <int> 54111, 53833, 52574, 54293, 53560, 49519...
## $ created_on <dttm> 2016-03-10 13:52:00, 2016-03-09 14:22:0...
## $ request_type <chr> "Rodent control", "Rodent control", "Pot...
## $ request_origin <chr> "Call Center", "Call Center", "Call Cent...
## $ status <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1...
## $ department <chr> "Animal Care & Control", "Animal Care & ...
## $ neighborhood <chr> "Middle Hill", "Squirrel Hill North", "L...
## $ council_district <int> 6, 8, 9, NA, 9, 9, 9, 3, 9, 1, 4, 4, 9, ...
## $ ward <int> 5, 14, 12, NA, 13, 13, 13, 16, 13, 23, 1...
## $ tract <dbl> 42003050100, 42003140300, 42003120800, N...
## $ public_works_division <int> 3, 3, 2, NA, 2, 2, 2, 4, 2, 1, 4, 4, 2, ...
## $ pli_division <int> 5, 14, 12, NA, 13, 13, 13, 16, 13, 23, 1...
## $ police_zone <int> 2, 4, 5, NA, 5, 5, 5, 3, 5, 1, 6, 3, 5, ...
## $ fire_zone <chr> "2-1", "2-18", "3-12", NA, "3-17", "3-17...
## $ x <dbl> -79.97765, -79.92450, -79.91455, NA, -79...
## $ y <dbl> 40.44579, 40.43986, 40.46527, NA, 40.459...
## $ geo_accuracy <chr> "APPROXIMATE", "APPROXIMATE", "EXACT", "...
## $ date <date> 2016-03-10, 2016-03-09, 2016-03-03, 201...
## $ time <S4: Period> 13H 52M 0S, 14H 22M 0S, 7H 13M 0S...
## $ month <ord> Mar, Mar, Mar, Mar, Mar, Feb, Feb, Mar, ...
## $ year <dbl> 2016, 2016, 2016, 2016, 2016, 2016, 2016...
## $ yday <dbl> 70, 69, 63, 71, 68, 53, 53, 69, 64, 68, ...
%>% (the pipe) can be read as “and then”
%>% passes the object on its left, such as a dataframe, to the function on its right as that function's first argument
df %>% #select the dataframe
select(date, request_type) #select the date and request_type columns
## # A tibble: 225,189 x 2
## date request_type
## <date> <chr>
## 1 2016-03-10 Rodent control
## 2 2016-03-09 Rodent control
## 3 2016-03-03 Potholes
## 4 2016-03-11 Building Without a Permit
## 5 2016-03-08 Potholes
## 6 2016-02-22 Potholes
## 7 2016-02-22 Potholes
## 8 2016-03-09 Rodent control
## 9 2016-03-04 Potholes
## 10 2016-03-08 Rodent control
## # ... with 225,179 more rows
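The pipe is just another way to write an ordinary function call. A minimal sketch on a made-up two-row dataframe (toy is hypothetical) showing that both forms give the same result:

```r
library(dplyr)

toy <- data.frame(date = as.Date(c("2016-03-10", "2016-03-03")),
                  request_type = c("Rodent control", "Potholes"),
                  status = c(1, 1))

# the pipe passes toy in as the first argument of select()
piped  <- toy %>% select(date, request_type)
nested <- select(toy, date, request_type)

identical(piped, nested)  # TRUE
```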
df %>%
select(date, request_type) %>%
filter(request_type == "Potholes") #use the string "Potholes" to filter the dataframe
## # A tibble: 31,735 x 2
## date request_type
## <date> <chr>
## 1 2016-03-03 Potholes
## 2 2016-03-08 Potholes
## 3 2016-02-22 Potholes
## 4 2016-02-22 Potholes
## 5 2016-03-04 Potholes
## 6 2016-03-11 Potholes
## 7 2016-03-08 Potholes
## 8 2016-03-08 Potholes
## 9 2016-03-08 Potholes
## 10 2016-03-08 Potholes
## # ... with 31,725 more rows
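filter() can take several conditions, and %in% matches against a list of values. A sketch on a made-up three-row dataframe (toy and its values are hypothetical):

```r
library(dplyr)

toy <- data.frame(date = as.Date(c("2016-03-10", "2016-03-03", "2016-02-22")),
                  request_type = c("Rodent control", "Potholes", "Potholes"),
                  stringsAsFactors = FALSE)

# conditions separated by commas must all be TRUE, like &
since_march <- toy %>%
  filter(request_type == "Potholes", date >= as.Date("2016-03-01"))
nrow(since_march)  # 1

# %in% keeps rows that match any value in the vector
either_type <- toy %>%
  filter(request_type %in% c("Potholes", "Rodent control"))
nrow(either_type)  # 3
```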
df %>%
select(date, request_type) %>%
filter(request_type == "Potholes") %>%
mutate(weekday = wday(date, label = TRUE))
## # A tibble: 31,735 x 3
## date request_type weekday
## <date> <chr> <ord>
## 1 2016-03-03 Potholes Thu
## 2 2016-03-08 Potholes Tue
## 3 2016-02-22 Potholes Mon
## 4 2016-02-22 Potholes Mon
## 5 2016-03-04 Potholes Fri
## 6 2016-03-11 Potholes Fri
## 7 2016-03-08 Potholes Tue
## 8 2016-03-08 Potholes Tue
## 9 2016-03-08 Potholes Tue
## 10 2016-03-08 Potholes Tue
## # ... with 31,725 more rows
(df %>%
select(date, request_type) %>% #select columns
filter(request_type == "Potholes") %>% #filter by "Potholes"
mutate(month = month(date, label = TRUE)) %>% #add month column
group_by(request_type, month) %>% #group by the unique combinations of request_type and month
summarize(count = n()) %>% #count the number of rows in each combination of request_type and month
arrange(desc(count)) -> df_potholes_month) #sort the rows from most to fewest requests
## # A tibble: 12 x 3
## # Groups: request_type [1]
## request_type month count
## <chr> <ord> <int>
## 1 Potholes Feb 5569
## 2 Potholes Mar 3961
## 3 Potholes Apr 3873
## 4 Potholes May 3388
## 5 Potholes Jan 3089
## 6 Potholes Jun 2896
## 7 Potholes Jul 2688
## 8 Potholes Aug 1913
## 9 Potholes Nov 1344
## 10 Potholes Sep 1260
## 11 Potholes Oct 1113
## 12 Potholes Dec 641
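count() is a shortcut for the group_by() + summarize(n = n()) pattern. A sketch on made-up data (toy is hypothetical) comparing the two:

```r
library(dplyr)

toy <- data.frame(request_type = c("Potholes", "Potholes", "Rodent control"),
                  month = c("Feb", "Feb", "Mar"),
                  stringsAsFactors = FALSE)

long_way <- toy %>%
  group_by(request_type, month) %>%
  summarize(n = n()) %>%
  ungroup()

short_way <- toy %>%
  count(request_type, month)

long_way$n   # 2 1
short_way$n  # 2 1
```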
Wrapping an assignment in parentheses saves the object AND prints it in the console
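A tiny sketch of the parentheses trick (total is a hypothetical name):

```r
(total <- 2 * 3)  # saves 6 in total and prints it
total             # 6
```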
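ggplot(data = <DATA>, aes(x = <X_VARIABLE>, y = <Y_VARIABLE>)) + #template: replace each <...> placeholder
<GEOM_FUNCTION>()
Pipe your data directly into ggplot2. The pipe fills in the data argument, so leave it out
some_dataframe %>%
ggplot(aes(x = <X_VARIABLE>, y = <Y_VARIABLE>)) +
<GEOM_FUNCTION>()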
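Here is the template filled in with a small dataframe, once with data = and once with the pipe. geom_point() is just one example of a geom function:

```r
library(ggplot2)
library(dplyr)  # for %>%

my_df <- data.frame(a = 1:5, b = 6:10)

# without the pipe: the data goes in ggplot()'s data argument
p1 <- ggplot(data = my_df, aes(x = a, y = b)) +
  geom_point()

# with the pipe: my_df is passed in as the data automatically
p2 <- my_df %>%
  ggplot(aes(x = a, y = b)) +
  geom_point()

# type p1 or p2 in the console to draw the plot
```

Note that %>% binds more tightly than +, so the pipe hands my_df to ggplot() before the layers are added.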
Graph the number of pothole requests per month
df_potholes_month %>%
ggplot(aes(x = month, y = count)) + #put the month column on the x axis, count on the y axis
geom_col() #graph the data with columns
Make it pretty. Add a title, subtitle, axis labels, captions, and themes
df_potholes_month %>%
ggplot(aes(month, count)) +
geom_col() +
labs(title = "Pothole requests to Pittsburgh 311",
x = "",
y = "Number of requests",
caption = "Source: Western Pennsylvania Regional Data Center") +
theme_bw()
Make a line graph of the number of pothole requests in the dataset by date
df %>%
filter(request_type == "Potholes") %>%
count(date) #shortcut for group_by(date) + summarize(n = n()): the number of rows per date
## # A tibble: 983 x 2
## date n
## <date> <int>
## 1 2015-04-20 119
## 2 2015-04-21 101
## 3 2015-04-22 109
## 4 2015-04-23 102
## 5 2015-04-24 84
## 6 2015-04-27 85
## 7 2015-04-28 101
## 8 2015-04-29 107
## 9 2015-04-30 83
## 10 2015-05-01 66
## # ... with 973 more rows
#assign labels to objects to save some typing
my_title <- "Pothole requests to Pittsburgh 311"
my_caption <- "Source: Western Pennsylvania Regional Data Center"
df %>%
filter(request_type == "Potholes") %>%
count(date) %>%
ggplot(aes(date, n)) +
geom_line() + #use a line to graph the data
labs(title = my_title, #use the object you created earlier
x = "",
y = "Number of requests",
caption = my_caption) + #use the object you created earlier
theme_bw(base_size = 18) #base_size sets the font size (base_family sets the font family)
Note that ggplot2 automatically formats the axis labels for dates
Graph the data by number of requests per day of the year
(df %>%
select(request_type, date) %>%
filter(request_type == "Potholes") %>%
mutate(year = year(date), #create a year column
yday = yday(date)) %>% #create a day of the year column
count(year, yday) -> df_day_of_year) #shortcut for group_by + summarize for counting. returns "n"
## # A tibble: 983 x 3
## year yday n
## <dbl> <dbl> <int>
## 1 2015 110 119
## 2 2015 111 101
## 3 2015 112 109
## 4 2015 113 102
## 5 2015 114 84
## 6 2015 117 85
## 7 2015 118 101
## 8 2015 119 107
## 9 2015 120 83
## 10 2015 121 66
## # ... with 973 more rows
df_day_of_year %>%
ggplot(aes(yday, n, group = year)) + #group = year draws a separate line for each year
geom_line() +
labs(title = my_title,
x = "Day of the year",
y = "Number of requests",
caption = my_caption) +
theme_bw(base_size = 18)
That plotted a line for each year, but there is no way to tell which line corresponds to which year
Color the lines by the year
df_day_of_year %>%
ggplot(aes(yday, n, color = as.factor(year))) + #color the lines by year. as.factor() turns the numeric year column into a factor (a categorical variable), so each year gets its own color
geom_line() +
labs(title = my_title,
x = "Day of the year",
y = "Number of requests",
caption = my_caption) +
theme_bw(base_size = 18)
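Why as.factor()? ggplot2 maps a numeric column to a continuous color gradient, but a factor to one distinct color per level. A base R sketch with made-up years:

```r
years <- c(2016, 2015, 2016, 2017)  # numeric, like the year column

class(years)                # "numeric"

year_factor <- as.factor(years)
class(year_factor)          # "factor"
levels(year_factor)         # "2015" "2016" "2017"
```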
Graph the cumulative sum of pothole requests per year
(df %>%
select(request_type, date) %>%
filter(request_type == "Potholes") %>%
mutate(year = year(date),
yday = yday(date)) %>%
arrange(date) %>% #always arrange your data for cumulative sums
group_by(year, yday) %>%
summarize(n = n()) %>%
ungroup() %>%
group_by(year) %>%
mutate(cumsum = cumsum(n)) -> df_cumulative_sum) #calculate the cumulative sum per year
## # A tibble: 983 x 4
## # Groups: year [4]
## year yday n cumsum
## <dbl> <dbl> <int> <int>
## 1 2015 110 119 119
## 2 2015 111 101 220
## 3 2015 112 109 329
## 4 2015 113 102 431
## 5 2015 114 84 515
## 6 2015 117 85 600
## 7 2015 118 101 701
## 8 2015 119 107 808
## 9 2015 120 83 891
## 10 2015 121 66 957
## # ... with 973 more rows
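group_by(year) makes cumsum() restart its running total for each year. A sketch on made-up counts (toy is hypothetical):

```r
library(dplyr)

toy <- data.frame(year = c(2016, 2016, 2016, 2017, 2017),
                  yday = c(1, 2, 3, 1, 2),
                  n = c(10, 5, 7, 3, 4))

toy_cumsum <- toy %>%
  arrange(year, yday) %>%         # put the days in order within each year
  group_by(year) %>%              # one running total per year
  mutate(cumsum = cumsum(n)) %>%
  ungroup()

toy_cumsum$cumsum  # 10 15 22 3 7
```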
df_cumulative_sum %>%
ggplot(aes(yday, cumsum, color = as.factor(year))) +
geom_line(size = 2) +
labs(title = my_title,
x = "Day of the year",
y = "Cumulative sum of requests",
caption = my_caption) +
scale_color_discrete("Year") +
theme_bw(base_size = 18)
Since 2015 and 2018 have incomplete data, filter them out
df %>%
filter(date >= ymd("2016-01-01"),
date < ymd("2018-01-01")) -> df_filtered #keep 2016 and 2017 only. "<" leaves out January 1, 2018
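The end of a date range is easy to get wrong: "<=" on January 1, 2018 lets that one day of 2018 through, while "<" does not. A sketch with four made-up dates (toy is hypothetical):

```r
library(dplyr)
library(lubridate)

toy <- data.frame(date = ymd(c("2015-12-31", "2016-01-01", "2017-12-31", "2018-01-01")))

with_le <- toy %>% filter(date >= ymd("2016-01-01"), date <= ymd("2018-01-01"))
nrow(with_le)  # 3: 2018-01-01 slips through

with_lt <- toy %>% filter(date >= ymd("2016-01-01"), date < ymd("2018-01-01"))
nrow(with_lt)  # 2: only the 2016 and 2017 dates

# another way to say "only 2016 and 2017"
by_year <- toy %>% filter(year(date) %in% 2016:2017)
nrow(by_year)  # 2
```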
df_filtered %>%
count(request_type, sort = TRUE) %>%
top_n(5, n) %>% #select the 5 request types with the largest n
ungroup() -> df_top_requests
df_filtered %>%
semi_join(df_top_requests, by = "request_type") %>% #a semi join keeps only the rows whose request_type also appears in df_top_requests
count(request_type, month) %>%
ggplot(aes(month, n, group = request_type, fill = request_type)) +
geom_area() +
scale_fill_discrete("Request type") + #change the title of the fill legend
scale_y_continuous(expand = c(0, 0)) + #remove the padding around the edges
scale_x_discrete(expand = c(0, 0)) +
labs(title = "Top 5 types of 311 requests in Pittsburgh",
subtitle = "2016 to 2017",
x = "",
y = "Number of requests",
caption = my_caption) +
theme_bw(base_size = 18) +
theme(panel.grid = element_blank()) #remove the gridlines from the plot
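A semi join filters one dataframe by another without adding columns, and an anti join keeps the non-matches. A sketch on two made-up dataframes (requests, top_types, and the "Other" type are hypothetical):

```r
library(dplyr)

requests <- data.frame(request_type = c("Potholes", "Potholes", "Rodent control", "Other"),
                       month = c("Jan", "Feb", "Jan", "Feb"),
                       stringsAsFactors = FALSE)

top_types <- data.frame(request_type = c("Potholes", "Rodent control"),
                        n = c(2L, 1L),
                        stringsAsFactors = FALSE)

# keep rows of requests that have a match in top_types; no columns are added
kept <- requests %>% semi_join(top_types, by = "request_type")
nrow(kept)             # 3

# keep rows of requests that have no match
dropped <- requests %>% anti_join(top_types, by = "request_type")
dropped$request_type   # "Other"
```

Setting by = explicitly avoids joining on any other column the two dataframes happen to share.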
Load the ggmap package, which works with ggplot2
library(ggmap)
Select the request_type, x, and y columns. x and y are longitude and latitude
(df %>%
select(request_type, x, y) %>%
filter(!is.na(x), !is.na(y),
request_type == "Potholes") -> df_map) #remove missing x and y values
## # A tibble: 31,735 x 3
## request_type x y
## <chr> <dbl> <dbl>
## 1 Potholes -79.9 40.5
## 2 Potholes -79.9 40.5
## 3 Potholes -79.9 40.5
## 4 Potholes -79.9 40.5
## 5 Potholes -79.9 40.5
## 6 Potholes -80.0 40.4
## 7 Potholes -79.9 40.5
## 8 Potholes -79.9 40.5
## 9 Potholes -79.9 40.5
## 10 Potholes -79.9 40.5
## # ... with 31,725 more rows
city_map <- get_map("North Oakland, Pittsburgh, PA",
zoom = 12,
maptype = "toner",
source = "stamen")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=North+Oakland,+Pittsburgh,+PA&zoom=12&size=640x640&scale=2&maptype=terrain&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=North%20Oakland,%20Pittsburgh,%20PA&sensor=false
## Map from URL : http://tile.stamen.com/toner/12/1137/1542.png
## Map from URL : http://tile.stamen.com/toner/12/1138/1542.png
## Map from URL : http://tile.stamen.com/toner/12/1139/1542.png
## Map from URL : http://tile.stamen.com/toner/12/1137/1543.png
## Map from URL : http://tile.stamen.com/toner/12/1138/1543.png
## Map from URL : http://tile.stamen.com/toner/12/1139/1543.png
## Map from URL : http://tile.stamen.com/toner/12/1137/1544.png
## Map from URL : http://tile.stamen.com/toner/12/1138/1544.png
## Map from URL : http://tile.stamen.com/toner/12/1139/1544.png
## Map from URL : http://tile.stamen.com/toner/12/1137/1545.png
## Map from URL : http://tile.stamen.com/toner/12/1138/1545.png
## Map from URL : http://tile.stamen.com/toner/12/1139/1545.png
(city_map <- ggmap(city_map))
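If the get_map() call fails on a current setup: Stamen map tiles are now served by Stadia Maps, which ggmap 4.0 and later reaches through get_stadiamap() with a free API key, and looking up a place name like "North Oakland, Pittsburgh, PA" goes through Google, which now also needs a key. A sketch, under those assumptions, that instead builds a bounding box from the points themselves. The points are made up, and STADIA_MAPS_API_KEY is a hypothetical environment variable name for your key:

```r
library(ggmap)

# made-up points standing in for df_map (x = longitude, y = latitude)
pts <- data.frame(x = c(-79.98, -79.92, -79.91),
                  y = c(40.45, 40.44, 40.47))

# a left/bottom/right/top box around the points, padded by 5% on each side
bbox <- make_bbox(lon = x, lat = y, data = pts, f = 0.05)
bbox

# only download tiles if a key has been set (hypothetical variable name)
key <- Sys.getenv("STADIA_MAPS_API_KEY")
if (nzchar(key)) {
  register_stadiamaps(key, write = FALSE)
  city_map <- get_stadiamap(bbox, zoom = 12, maptype = "stamen_toner")
  ggmap(city_map)
}
```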
Put the data on the map
city_map +
geom_point(data = df_map, aes(x, y, color = request_type)) #graph the data with dots
## Warning: Removed 729 rows containing missing values (geom_point).
There is too much data on the graph. Make the dots more transparent to show density
city_map +
geom_point(data = df_map, aes(x, y, color = request_type), alpha = .1) #graph the data with dots
## Warning: Removed 729 rows containing missing values (geom_point).
Still not great. Density plots are better for showing overplotted data
#Put the data on the map
city_map +
stat_density_2d(data = df_map, #Using a 2d density contour
aes(x, #longitude
y, #latitude,
fill = request_type,
alpha = ..level..), #Use alpha so you can see the map under the data
geom = "polygon") + #We want the contour in a polygon
scale_alpha_continuous(range = c(.1, 1)) + #manually set the range for the alpha
guides(alpha = guide_legend("Number of requests"),
fill = FALSE) +
labs(title = "Pothole requests in Pittsburgh",
subtitle = "311 data",
x = "",
y = "",
caption = my_caption) +
theme_bw(base_size = 18) +
theme(axis.text = element_blank())
## Warning: Removed 729 rows containing non-finite values (stat_density2d).